50 results for Molecular Sequence Data

in National Center for Biotechnology Information - NCBI


Relevance:

100.00%

Abstract:

Molecular and fragment ion data of intact 8- to 43-kDa proteins from electrospray Fourier-transform tandem mass spectrometry are matched against the corresponding data in sequence data bases. Extending the sequence tag concept of Mann and Wilm for matching peptides, a partial amino acid sequence in the unknown is first identified from the mass differences of a series of fragment ions, and the mass position of this sequence is defined from molecular weight and the fragment ion masses. For three studied proteins, a single sequence tag retrieved only the correct protein from the data base; a fourth protein required the input of two sequence tags. However, three of the data base proteins differed by having an extra methionine or by missing an acetyl or heme substitution. The positions of these modifications in the protein examined were greatly restricted by the mass differences of its molecular and fragment ions versus those of the data base. To characterize the primary structure of an unknown represented in the data base, this method is fast and specific and does not require prior enzymatic or chemical degradation.
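The tag-reading step described in this abstract (a partial sequence inferred from the mass differences of a series of fragment ions) can be sketched roughly as follows. This is a minimal illustration, not the authors' software: the function name `read_sequence_tag`, the 0.02 Da tolerance, and the example ladder are assumptions, and the mass table is only a subset of the standard monoisotopic residue masses.

```python
# Monoisotopic residue masses in Da (illustrative subset, not a full table).
RESIDUE_MASS = {
    "G": 57.02146,
    "A": 71.03711,
    "S": 87.03203,
    "P": 97.05276,
    "V": 99.06841,
    "T": 101.04768,
    "L": 113.08406,  # isoleucine has the same mass and cannot be told apart
    "N": 114.04293,
    "D": 115.02694,
    "E": 129.04259,
    "M": 131.04049,
    "F": 147.06841,
}


def read_sequence_tag(fragment_masses, tolerance=0.02):
    """Map each gap between consecutive fragment masses to a residue.

    Gaps that match no residue in the table within the (assumed)
    tolerance are reported as '?'.
    """
    masses = sorted(fragment_masses)
    tag = []
    for lower, upper in zip(masses, masses[1:]):
        diff = upper - lower
        match = "?"
        for residue, mass in RESIDUE_MASS.items():
            if abs(diff - mass) <= tolerance:
                match = residue
                break
        tag.append(match)
    return "".join(tag)


# Hypothetical fragment ladder built by adding G, A, V to an arbitrary start mass.
ladder = [1000.0]
for residue in "GAV":
    ladder.append(ladder[-1] + RESIDUE_MASS[residue])

print(read_sequence_tag(ladder))
```

A real search would also use the mass position of the tag within the protein, as the abstract notes, to restrict which data base entries can match.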

Relevance:

100.00%

Abstract:

The Plasmodium falciparum Genome Database (http://PlasmoDB.org) integrates sequence information, automated analyses and annotation data emerging from the P.falciparum genome sequencing consortium. To date, raw sequence coverage is available for >90% of the genome, and two chromosomes have been finished and annotated. Data in PlasmoDB are organized by chromosome (1–14), and can be accessed using a variety of tools for graphical and text-based browsing or downloaded in various file formats. The GUS (Genomics Unified Schema) implementation of PlasmoDB provides a multi-species genomic relational database, incorporating data from human and mouse, as well as P.falciparum. The relational schema uses a highly structured format to accommodate diverse data sets related to genomic sequence and gene expression. Tools have been designed to facilitate complex biological queries, including many that are specific to Plasmodium parasites and malaria as a disease. Additional projects seek to integrate genomic information with the rich data sets now becoming available for RNA transcription, protein expression, metabolic pathways, genetic and physical mapping, antigenic and population diversity, and phylogenetic relationships with other apicomplexan parasites. The overall goal of PlasmoDB is to facilitate Internet- and CD-ROM-based access to both finished and unfinished sequence information by the global malaria research community.

Relevance:

100.00%

Abstract:

The release of vast quantities of DNA sequence data by large-scale genome and expressed sequence tag (EST) projects underlines the necessity for the development of efficient and inexpensive ways to link sequence databases with temporal and spatial expression profiles. Here we demonstrate the power of linking cDNA sequence data (including EST sequences) with transcript profiles revealed by cDNA-AFLP, a highly reproducible differential display method based on restriction enzyme digests and selective amplification under high stringency conditions. We have developed a computer program (GenEST) that predicts the sizes of virtual transcript-derived fragments (TDFs) of in silico-digested cDNA sequences retrieved from databases. The vast majority of the resulting virtual TDFs could be traced back among the thousands of TDFs displayed on cDNA-AFLP gels. Sequencing of the corresponding bands excised from cDNA-AFLP gels revealed no inconsistencies. As a consequence, cDNA sequence databases can be screened very efficiently to identify genes with relevant expression profiles. The other way round, it is possible to switch from cDNA-AFLP gels to sequences in the databases. Using the restriction enzyme recognition sites, the primer extensions and the estimated TDF size as identifiers, the DNA sequence(s) corresponding to a TDF with an interesting expression pattern can be identified. In this paper we show examples in both directions by analyzing the plant parasitic nematode Globodera rostochiensis. Various novel pathogenicity factors were identified by combining ESTs from the infective stage juveniles with expression profiles of ∼4000 genes in five developmental stages produced by cDNA-AFLP.
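The idea of predicting virtual TDF sizes from in silico-digested cDNA can be sketched as below. This is not GenEST itself: the function names, the choice of recognition sites (GAATTC and TTAA), and the size convention are assumptions made for illustration. Sizes here span from the start of the first recognition site to the end of the second, ignoring exact cut offsets, adapters, and selective primer extensions.

```python
def find_sites(sequence, site):
    """Return 0-based start positions of every occurrence of a recognition site."""
    positions = []
    start = sequence.find(site)
    while start != -1:
        positions.append(start)
        start = sequence.find(site, start + 1)
    return positions


def virtual_fragments(sequence, site_a, site_b):
    """List (start, end, size) for stretches flanked by two different sites.

    Only neighbouring sites of different enzymes are paired, because a
    fragment amplified in a two-enzyme display needs one end of each kind.
    """
    sequence = sequence.upper()
    sites = [(pos, site_a) for pos in find_sites(sequence, site_a)]
    sites += [(pos, site_b) for pos in find_sites(sequence, site_b)]
    sites.sort()
    fragments = []
    for (pos1, s1), (pos2, s2) in zip(sites, sites[1:]):
        if s1 != s2:
            end = pos2 + len(s2)
            fragments.append((pos1, end, end - pos1))
    return fragments


# Hypothetical cDNA with one GAATTC site and one TTAA site.
cdna = "AAA" + "GAATTC" + "CCCCC" + "TTAA" + "GGG"
print(virtual_fragments(cdna, "GAATTC", "TTAA"))
```

Matching such predicted sizes against band positions on a gel would additionally require a size tolerance, which is left out of this sketch.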

Relevance:

100.00%

Abstract:

Of the approximately 380 families of angiosperms, representatives of only 10 are known to form symbiotic associations with nitrogen-fixing bacteria in root nodules. The morphologically based classification schemes proposed by taxonomists suggest that many of these 10 families of plants are only distantly related, engendering the hypothesis that the capacity to fix nitrogen evolved independently several, if not many, times. This has in turn influenced attitudes toward the likelihood of transferring genes responsible for symbiotic nitrogen fixation to crop species lacking this ability. Phylogenetic analysis of DNA sequences for the chloroplast gene rbcL indicates, however, that representatives of all 10 families with nitrogen-fixing symbioses occur together, with several families lacking this association, in a single clade. This study therefore indicates that only one lineage of closely related taxa achieved the underlying genetic architecture necessary for symbiotic nitrogen fixation in root nodules.

Relevance:

100.00%

Abstract:

Insects in the order Plecoptera (stoneflies) use a form of two-dimensional aerodynamic locomotion called surface skimming to move across water surfaces. Because their weight is supported by water, skimmers can achieve effective aerodynamic locomotion even with small wings and weak flight muscles. These mechanical features stimulated the hypothesis that surface skimming may have been an intermediate stage in the evolution of insect flight, which has perhaps been retained in certain modern stoneflies. Here we present a phylogeny of Plecoptera based on nucleotide sequence data from the small subunit rRNA (18S) gene. By mapping locomotor behavior and wing structural data onto the phylogeny, we distinguish between the competing hypotheses that skimming is a retained ancestral trait or, alternatively, a relatively recent loss of flight. Our results show that basal stoneflies are surface skimmers, and that various forms of surface skimming are distributed widely across the plecopteran phylogeny. Stonefly wings show evolutionary trends in the number of cross veins and the thickness of the cuticle of the longitudinal veins that are consistent with elaboration and diversification of flight-related traits. These data support the hypothesis that the first stoneflies were surface skimmers, and that wing structures important for aerial flight have become elaborated and more diverse during the radiation of modern stoneflies.

Relevance:

100.00%

Abstract:

The EMBL Nucleotide Sequence Database (http://www.ebi.ac.uk/embl/) is maintained at the European Bioinformatics Institute (EBI) in an international collaboration with the DNA Data Bank of Japan (DDBJ) and GenBank at the NCBI (USA). Data is exchanged amongst the collaborating databases on a daily basis. The major contributors to the EMBL database are individual authors and genome project groups. Webin is the preferred web-based submission system for individual submitters, whilst automatic procedures allow incorporation of sequence data from large-scale genome sequencing centres and from the European Patent Office (EPO). Database releases are produced quarterly. Network services allow free access to the most up-to-date data collection via ftp, email and World Wide Web interfaces. EBI’s Sequence Retrieval System (SRS), a network browser for databanks in molecular biology, integrates and links the main nucleotide and protein databases plus many specialized databases. For sequence similarity searching a variety of tools (e.g. Blitz, Fasta, BLAST) are available which allow external users to compare their own sequences against the latest data in the EMBL Nucleotide Sequence Database and SWISS-PROT.

Relevance:

100.00%

Abstract:

Woody Sonchus and five related genera (Babcockia, Taeckholmia, Sventenia, Lactucosonchus, and Prenanthes) of the Macaronesian islands have been regarded as an outstanding example of adaptive radiation in angiosperms. Internal transcribed spacer region of the nuclear rDNA (ITS) sequences were used to demonstrate that, despite the extensive morphological and ecological diversity of the plants, the entire alliance in insular Macaronesia has a common origin. The sequence data place Lactucosonchus as sister group to the remainder of the alliance and also indicate that four related genera are in turn sister groups to subg. Dendrosonchus and Taeckholmia. This implies that the woody members of Sonchus were derived from an ancestor similar to allied genera now present on the Canary Islands. It is also evident that the alliance probably occurred in the Canary Islands during the late Miocene or early Pliocene. A rapid radiation of major lineages in the alliance is consistent with an unresolved polytomy near the base and low ITS sequence divergence. Increase of woodiness is concordant with other insular endemics and refutes the relictural nature of woody Sonchus in the Macaronesian islands.

Relevance:

100.00%

Abstract:

Amino acid sequencing by recombinant DNA technology, although dramatically useful, is subject to base reading errors, is indirect, and is insensitive to posttranslational processing. Mass spectrometry techniques can provide molecular weight data from even relatively large proteins for such cDNA sequences and can serve as a check of an enzyme's purity and sequence integrity. Multiply-charged ions from electrospray ionization can be dissociated to yield structural information by tandem mass spectrometry, providing a second method for gaining additional confidence in primary sequence confirmation. Here, accurate (+/- 1 Da) molecular weight and molecular ion dissociation information for human muscle and brain creatine kinases has been obtained by electrospray ionization coupled with Fourier-transform mass spectrometry to help distinguish which of several published amino acid sequences for both enzymes are correct. The results herein are consistent with one published sequence for each isozyme, and the heterogeneity indicated by isoelectric focusing due to 1-Da deamidation changes. This approach appears generally useful for detailed sequence verification of recombinant proteins.

Relevance:

90.00%

Abstract:

The Mycetozoa include the cellular (dictyostelid), acellular (myxogastrid), and protostelid slime molds. However, available molecular data are in disagreement on both the monophyly and phylogenetic position of the group. Ribosomal RNA trees show the myxogastrid and dictyostelid slime molds as unrelated early branching lineages, but actin and β-tubulin trees place them together as a single coherent (monophyletic) group, closely related to the animal–fungal clade. We have sequenced the elongation factor-1α genes from one member of each division of the Mycetozoa, including Dictyostelium discoideum, for which cDNA sequences were previously available. Phylogenetic analyses of these sequences strongly support a monophyletic Mycetozoa, with the myxogastrid and dictyostelid slime molds most closely related to each other. All phylogenetic methods used also place this coherent Mycetozoan assemblage as emerging among the multicellular eukaryotes, tentatively supported as more closely related to animals + fungi than are green plants. With our data there are now three proteins that consistently support a monophyletic Mycetozoa and at least four that place these taxa within the “crown” of the eukaryote tree. We suggest that ribosomal RNA data should be more closely examined with regard to these questions, and we emphasize the importance of developing multiple sequence data sets.

Relevance:

90.00%

Abstract:

Molecular, sequence-based environmental surveys of microorganisms have revealed a large degree of previously uncharacterized diversity. However, nearly all studies of the human endogenous bacterial flora have relied on cultivation and biochemical characterization of the resident organisms. We used molecular methods to characterize the breadth of bacterial diversity within the human subgingival crevice by comparing 264 small subunit rDNA sequences from 21 clone libraries created with products amplified directly from subgingival plaque, with sequences obtained from bacteria that were cultivated from the same specimen, as well as with sequences available in public databases. The majority (52.5%) of the directly amplified 16S rRNA sequences were <99% identical to sequences within public databases. In contrast, only 21.4% of the sequences recovered from cultivated bacteria showed this degree of variability. The 16S rDNA sequences recovered by direct amplification were also more deeply divergent; 13.5% of the amplified sequences were more than 5% nonidentical to any known sequence, a level of dissimilarity that is often found between members of different genera. None of the cultivated sequences exhibited this degree of sequence dissimilarity. Finally, direct amplification of 16S rDNA yielded a more diverse view of the subgingival bacterial flora than did cultivation. Our data suggest that a significant proportion of the resident human bacterial flora remain poorly characterized, even within this well studied and familiar microbial environment.
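The identity thresholds quoted in this abstract (less than 99% identical, more than 5% nonidentical) rest on a pairwise percent-identity calculation. A minimal sketch follows, assuming the two sequences are already aligned to equal length; the function name and the treatment of gaps (columns where both sequences have a gap are skipped, a gap against a base counts as a mismatch) are assumptions, not the study's stated procedure.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of compared alignment columns in which both sequences agree."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("sequences must be aligned to the same length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a.upper(), aligned_b.upper()):
        if a == gap and b == gap:
            continue  # column carries no information for this pair
        compared += 1
        if a == b:
            matches += 1
    if compared == 0:
        raise ValueError("no comparable columns")
    return 100.0 * matches / compared


# Hypothetical aligned fragments differing at one of four positions.
identity = percent_identity("ACGT", "ACGA")
print(identity, identity < 99.0)
```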

Relevance:

90.00%

Abstract:

The function of many of the uncharacterized open reading frames discovered by genomic sequencing can be determined at the level of expressed gene products, the proteome. However, identifying the cognate gene from minute amounts of protein has been one of the major problems in molecular biology. Using yeast as an example, we demonstrate here that mass spectrometric protein identification is a general solution to this problem given a completely sequenced genome. As a first screen, our strategy uses automated laser desorption ionization mass spectrometry of the peptide mixtures produced by in-gel tryptic digestion of a protein. Up to 90% of proteins are identified by searching sequence data bases by lists of peptide masses obtained with high accuracy. The remaining proteins are identified by partially sequencing several peptides of the unseparated mixture by nanoelectrospray tandem mass spectrometry followed by data base searching with multiple peptide sequence tags. In blind trials, the method led to unambiguous identification in all cases. In the largest individual protein identification project to date, a total of 150 gel spots—many of them at subpicomole amounts—were successfully analyzed, greatly enlarging a yeast two-dimensional gel data base. More than 32 proteins were novel and matched to previously uncharacterized open reading frames in the yeast genome. This study establishes that mass spectrometry provides the required throughput, the certainty of identification, and the general applicability to serve as the method of choice to connect genome and proteome.
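The first screen described here depends on predicting which peptides a tryptic digest of each data base protein would produce. A minimal sketch of that in silico digestion is given below, using the commonly stated rule that trypsin cleaves after lysine (K) or arginine (R) except when the next residue is proline (P). The function name and example sequence are hypothetical, and missed cleavages and peptide mass calculation are left out.

```python
import re


def tryptic_peptides(protein):
    """Split a protein sequence after K or R, unless the next residue is P."""
    # Zero-width split points: preceded by K or R, not followed by P.
    # Splitting on empty matches requires Python 3.7 or later.
    pieces = re.split(r"(?<=[KR])(?!P)", protein.upper())
    return [piece for piece in pieces if piece]


# Hypothetical sequence: the R-P bond is not cleaved.
print(tryptic_peptides("AKRPGKLM"))
```

Each predicted peptide would then be converted to a mass and compared with the measured peptide mass list.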

Relevance:

90.00%

Abstract:

Chromosome 7q22 has been the focus of many cytogenetic and molecular studies aimed at delineating regions commonly deleted in myeloid leukemias and myelodysplastic syndromes. We have compared a gene-dense, GC-rich sub-region of 7q22 with the orthologous region on mouse chromosome 5. A physical map of 640 kb of genomic DNA from mouse chromosome 5 was derived from a series of overlapping bacterial artificial chromosomes. A 296 kb segment from the physical map, spanning Ache to Tfr2, was compared with 267 kb of human sequence. We identified a conserved linkage of 12 genes including an open reading frame flanked by Ache and Asr2, a novel cation-chloride cotransporter interacting protein Cip1, Ephb4, Zan and Perq1. While some of these genes have been previously described, in each case we present new data derived from our comparative sequence analysis. Adjacent unfinished sequence data from the mouse contains an orthologous block of 10 additional genes including three novel cDNA sequences that we subsequently mapped to human 7q22. Methods for displaying comparative genomic information, including unfinished sequence data, are becoming increasingly important. We supplement our printed comparative analysis with a new, Web-based program called Laj (local alignments with java). Laj provides interactive access to archived pairwise sequence alignments via the WWW. It displays synchronized views of a dot-plot, a percent identity plot, a nucleotide-level local alignment and a variety of relevant annotations. Our mouse–human comparison can be viewed at http://web.uvic.ca/~bioweb/laj.html. Laj is available at http://bio.cse.psu.edu/, along with online documentation and additional examples of annotated genomic regions.

Relevance:

90.00%

Abstract:

When many protein sequences are available for estimating the time of divergence between two species, it is customary to estimate the time for each protein separately and then use the average for all proteins as the final estimate. However, it can be shown that this estimate generally has an upward bias, and that an unbiased estimate is obtained by using distances based on concatenated sequences. We have shown that two concatenation-based distances, i.e., average gamma distance weighted with sequence length (d2) and multiprotein gamma distance (d3), generally give more satisfactory results than other concatenation-based distances. Using these two distance measures for 104 protein sequences, we estimated the time of divergence between mice and rats to be approximately 33 million years ago. Similarly, the time of divergence between humans and rodents was estimated to be approximately 96 million years ago. We also investigated the dependency of time estimates on statistical methods and various assumptions made by using sequence data from eubacteria, protists, plants, fungi, and animals. Our best estimates of the times of divergence between eubacteria and eukaryotes, between protists and other eukaryotes, and between plants, fungi, and animals were 3, 1.7, and 1.3 billion years ago, respectively. However, estimates of ancient divergence times are subject to a substantial amount of error caused by uncertainty of the molecular clock, horizontal gene transfer, errors in sequence alignments, etc.

Relevance:

90.00%

Abstract:

There is a need for faster and more sensitive algorithms for sequence similarity searching in view of the rapidly increasing amounts of genomic sequence data available. Parallel processing capabilities in the form of the single instruction, multiple data (SIMD) technology are now available in common microprocessors and enable a single microprocessor to perform many operations in parallel. The ParAlign algorithm has been specifically designed to take advantage of this technology. The new algorithm initially exploits parallelism to perform a very rapid computation of the exact optimal ungapped alignment score for all diagonals in the alignment matrix. Then, a novel heuristic is employed to compute an approximate score of a gapped alignment by combining the scores of several diagonals. This approximate score is used to select the most interesting database sequences for a subsequent Smith–Waterman alignment, which is also parallelised. The resulting method represents a substantial improvement compared to existing heuristics. The sensitivity and specificity of ParAlign was found to be as good as Smith–Waterman implementations when the same method for computing the statistical significance of the matches was used. In terms of speed, only the significantly less sensitive NCBI BLAST 2 program was found to outperform the new approach. Online searches are available at http://dna.uio.no/search/
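The first stage described in this abstract, an exact optimal ungapped alignment score for every diagonal of the alignment matrix, can be illustrated with the plain, non-parallel sketch below. It is not the ParAlign implementation: it uses no SIMD instructions, and the function name and the simple match/mismatch scores are assumptions standing in for a substitution matrix.

```python
def best_ungapped_scores(query, target, match=2, mismatch=-1):
    """Return {diagonal: best local ungapped score}, where diagonal = j - i.

    On each diagonal a running score is accumulated and reset to zero
    whenever it turns negative; the maximum seen is the best-scoring
    ungapped segment on that diagonal.
    """
    scores = {}
    for d in range(-(len(query) - 1), len(target)):
        best = running = 0
        i = max(0, -d)  # first query index on this diagonal
        j = i + d       # matching target index
        while i < len(query) and j < len(target):
            running += match if query[i] == target[j] else mismatch
            if running < 0:
                running = 0
            if running > best:
                best = running
            i += 1
            j += 1
        scores[d] = best
    return scores


# Hypothetical sequences: the query occurs in the target at offset 2.
scores = best_ungapped_scores("ACGT", "TTACGT")
print(max(scores, key=scores.get), max(scores.values()))
```

In the approach the abstract describes, scores from several diagonals are then combined heuristically to approximate a gapped score; that step is not attempted here.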

Relevance:

90.00%

Abstract:

Carbonic anhydrase (CA) (EC 4.2.1.1) enzymes catalyze the reversible hydration of CO2, a reaction that is important in many physiological processes. We have cloned and sequenced a full-length cDNA encoding an intracellular β-CA from the unicellular green alga Coccomyxa. Nucleotide sequence data show that the isolated cDNA contains an open reading frame encoding a polypeptide of 227 amino acids. The predicted polypeptide is similar to β-type CAs from Escherichia coli and higher plants, with an identity of 26% to 30%. The Coccomyxa cDNA was overexpressed in E. coli, and the enzyme was purified and biochemically characterized. The mature protein is a homotetramer with an estimated molecular mass of 100 kD. The CO2-hydration activity of the Coccomyxa enzyme is comparable with that of the pea homolog. However, the activity of Coccomyxa CA is largely insensitive to oxidative conditions, in contrast to similar enzymes from most higher plants. Fractionation studies further showed that Coccomyxa CA is extrachloroplastic.